Participatory Personalization in Classification Supplementary Material

Neural Information Processing Systems

The performance of a participatory system depends on individual reporting decisions: flat and sequential systems perform better than a minimal system, and the best-case performance of any participatory system exceeds that of any of its components. Given a participatory system, this can be evaluated by simulating the parameters of an individual disclosure model. When all group attributes are reported, the sequential system outperforms static personalized systems.


Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables

Guo, Mengtian, Gotz, David, Wang, Yue

arXiv.org Artificial Intelligence

Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.


On the Granularity of Causal Effect Identifiability

Chen, Yizuo, Darwiche, Adnan

arXiv.org Artificial Intelligence

The classical notion of causal effect identifiability is defined in terms of treatment and outcome variables. In this note, we consider the identifiability of state-based causal effects: how an intervention on a particular state of treatment variables affects a particular state of outcome variables. We demonstrate that state-based causal effects may be identifiable even when variable-based causal effects may not. Moreover, we show that this separation occurs only when additional knowledge -- such as context-specific independencies and conditional functional dependencies -- is available. We further examine knowledge that constrains the states of variables, and show that such knowledge does not improve identifiability on its own but can improve both variable-based and state-based identifiability when combined with other knowledge such as context-specific independencies. Our findings highlight situations where causal effects of interest may be estimable from observational data and this identifiability may be missed by existing variable-based frameworks.


Subset Selection for Stratified Sampling in Online Controlled Experiments

Momozu, Haru, Uehara, Yuki, Nishimura, Naoki, Ohashi, Koya, Jobson, Deddy, Li, Yilin, Dinh, Phuong, Sukegawa, Noriyoshi, Takano, Yuichi

arXiv.org Machine Learning

Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional technique for variance reduction to improve the sensitivity (or statistical power) of controlled experiments; this technique first divides the population into strata (homogeneous subgroups) based on stratification variables and then draws samples from each stratum to avoid sampling bias. To enhance the estimation accuracy of stratified sampling, we focus on the problem of selecting a subset of stratification variables that are effective in variance reduction. We design an efficient algorithm that selects stratification variables one by one by simulating a series of stratified sampling processes. We also estimate the computational complexity of our subset selection algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our method can outperform other variance reduction techniques especially when multiple variables have a certain correlation with the outcome variable. Our subset selection method for stratified sampling can improve the sensitivity of online controlled experiments, thus enabling more reliable marketing decisions.
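The variance-reduction mechanism described above can be illustrated with a minimal pure-Python sketch. The population, the binary stratification variable, and the 50/50 allocation are all synthetic assumptions for illustration; this is not the authors' subset-selection algorithm, only the underlying stratified-sampling idea it builds on.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: the outcome is strongly correlated with a binary
# stratification variable (e.g. "new vs returning visitor"); names illustrative.
population = [(s, 10.0 * s + random.gauss(0, 1))
              for s in (random.randint(0, 1) for _ in range(10_000))]
strata = {0: [y for s, y in population if s == 0],
          1: [y for s, y in population if s == 1]}

def srs_mean(n):
    """Estimate the population mean from a simple random sample of size n."""
    return statistics.mean(y for _, y in random.sample(population, n))

def stratified_mean(n):
    """Draw n/2 from each stratum (strata are roughly 50/50 here)."""
    draws = random.sample(strata[0], n // 2) + random.sample(strata[1], n // 2)
    return statistics.mean(draws)

# Repeat each sampling scheme and compare the variance of the estimator:
# stratification removes the between-strata component of the variance.
srs = [srs_mean(100) for _ in range(500)]
strat = [stratified_mean(100) for _ in range(500)]
print(statistics.variance(strat) < statistics.variance(srs))
```

Because the stratification variable explains most of the outcome's spread, the stratified estimator's variance is far smaller, which is exactly the property the paper's subset-selection algorithm tries to maximize when choosing stratification variables.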


A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes

Alzboon, Mowafaq Salem, Al-Batah, Mohammad, Alqaraleh, Muhyeeddin, Abuashour, Ahmad, Bader, Ahmad Fuad

arXiv.org Artificial Intelligence

In many nations, diabetes is becoming a significant health problem, and early identification and control are crucial. Using machine learning algorithms to predict diabetes has yielded encouraging results. Using the Pima Indians Diabetes dataset, this study evaluates the efficacy of several machine learning methods for diabetes prediction. The collection includes information on 768 patients, such as their ages, BMIs, and glucose levels. The techniques assessed are Logistic Regression, Decision Tree, Random Forest, k-Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting, and Neural Network. The findings indicate that the Neural Network algorithm performed best, with an accuracy of 78.57%. The study implies that machine learning algorithms can aid diabetes prediction and serve as an efficient early detection tool. Diabetes is a chronic metabolic disease affecting millions worldwide and is a significant cause of morbidity and death [1]. High blood glucose levels characterize the disorder and can result in complications including cardiovascular disease, stroke, blindness, and amputations. To prevent or postpone complications, diabetes must be recognized and treated as soon as feasible; however, this can be challenging because symptoms may be mild or absent [2]. Machine learning (ML) is a subfield of artificial intelligence that comprises the development of algorithms that can learn from data and generate inferences or predictions without being explicitly programmed. ML algorithms are beneficial in several fields, including healthcare.
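The comparison the study performs can be sketched in miniature: train a classifier on labelled feature vectors and measure held-out accuracy against a majority-class baseline. The data below is synthetic (rough stand-ins for glucose and BMI), and the 1-vote k-NN classifier is only one of the methods the paper compares; none of this reproduces the actual Pima results.

```python
import random

random.seed(1)

# Synthetic two-feature rows ("glucose", "bmi"); labels and distributions
# are invented for illustration, not drawn from the Pima dataset.
def sample(label):
    base = (140, 35) if label else (100, 28)
    return ([random.gauss(base[0], 15), random.gauss(base[1], 4)], label)

data = [sample(random.random() < 0.35) for _ in range(600)]
train, test = data[:450], data[450:]

def knn_predict(x, k=5):
    # k-nearest-neighbour majority vote, squared Euclidean distance
    neighbours = sorted(train,
                        key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], x)))
    return sum(lbl for _, lbl in neighbours[:k]) > k / 2

def majority_predict(x):
    # baseline: always predict the most common training label
    return sum(lbl for _, lbl in train) > len(train) / 2

def accuracy(predict):
    return sum(predict(x) == y for x, y in test) / len(test)

print(round(accuracy(knn_predict), 3), round(accuracy(majority_predict), 3))
```

Held-out accuracy above the majority baseline is the minimal evidence that a model has learned something; the paper extends this comparison across eight algorithms.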


Causal Estimation of Tokenisation Bias

Lesci, Pietro, Meister, Clara, Hofmann, Thomas, Vlachos, Andreas, Pimentel, Tiago

arXiv.org Artificial Intelligence

Modern language models are typically trained over subword sequences, but ultimately define probabilities over character-strings. Ideally, the choice of the tokeniser -- which maps character-strings to subwords -- should not affect the probability assigned to the underlying character-string; in practice, it does. We define this mismatch as tokenisation bias. In this work, we quantify one particular type of tokenisation bias: the effect of including or not a subword (e.g., $\langle hello \rangle$) in a tokeniser's vocabulary on the probability a trained model assigns to the corresponding characters (i.e., \textit{``hello''}). Estimating this effect is challenging because each model is trained with only one tokeniser. We address this by framing tokenisation bias as a causal effect and estimating it using the regression discontinuity design. Specifically, we exploit the fact that tokenisation algorithms rank subwords and add the first $K$ to a tokeniser's vocabulary, where $K$ is an arbitrary cutoff point. As such, we can estimate a causal effect by comparing similar subwords around this cutoff. Experimentally, we find that tokenisation consistently affects models' outputs across scales, vocabularies, and tokenisers. Notably, a subword's presence in a small model's vocabulary may increase its characters' probability by up to 17 times, highlighting tokenisation as a key design choice in language modelling.
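The regression discontinuity logic can be sketched on synthetic data: subwords are ranked, those with rank at most K enter the vocabulary, and an outcome varies smoothly in rank except for a jump at the cutoff. The jump size, trend, noise level, and window width below are all invented; this is a toy illustration of the estimator, not the paper's procedure.

```python
import random
import statistics

random.seed(2)

K = 500  # hypothetical vocabulary cutoff: subwords ranked 1..K are included

def log_prob(rank):
    # Synthetic outcome: a smooth trend in rank, plus a +0.8 jump for
    # subwords that made it into the vocabulary (rank <= K).
    jump = 0.8 if rank <= K else 0.0
    return -0.002 * rank + jump + random.gauss(0, 0.05)

outcomes = {r: log_prob(r) for r in range(1, 1001)}

# Local RDD estimate: compare mean outcomes in a narrow window on each
# side of the cutoff, where subwords are otherwise comparable. A narrow
# window keeps the bias from the smooth trend small.
w = 50
left = statistics.mean(outcomes[r] for r in range(K - w + 1, K + 1))
right = statistics.mean(outcomes[r] for r in range(K + 1, K + w + 1))
effect = left - right  # recovers the jump, up to a small trend term
print(round(effect, 2))
```

The estimate recovers the planted discontinuity up to the residual trend across the window, which is the bias-variance trade-off that governs window choice in any regression discontinuity design.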


An introduction to R package `mvs`

van Loon, Wouter

arXiv.org Machine Learning

In biomedical science, a set of objects or persons can often be described by multiple distinct sets of features obtained from different data sources or modalities (called "multi-view data"). Classical machine learning methods ignore the multi-view structure of such data, limiting model interpretability and performance. The R package `mvs` provides methods that were designed specifically for dealing with multi-view data, based on the multi-view stacking (MVS) framework. MVS is a form of supervised (machine) learning used to train multi-view classification or prediction models. MVS works by training a learning algorithm on each view separately, estimating the predictive power of each view-specific model through cross-validation, and then using another learning algorithm to assign weights to the view-specific models based on their estimated predictions. MVS is a form of ensemble learning, dividing the large multi-view learning problem into smaller sub-problems. Most of these sub-problems can be solved in parallel, making it computationally attractive. Additionally, the number of features of the sub-problems is greatly reduced compared with the full multi-view learning problem. This makes MVS especially useful when the total number of features is larger than the number of observations (i.e., high-dimensional data). MVS can still be applied even if the sub-problems are themselves high-dimensional by adding suitable penalty terms to the learning algorithms. Furthermore, MVS can be used to automatically select the views which are most important for prediction. The R package `mvs` makes fitting MVS models, including such penalty terms, easily and openly accessible. `mvs` allows for the fitting of stacked models with any number of levels, with different penalty terms, different outcome distributions, and provides several options for missing data handling.
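The MVS idea (per-view base learners, cross-validated estimates of each view's predictive power, then a meta-level combination) can be sketched in Python, even though `mvs` itself is an R package. The two synthetic views, the nearest-centroid base learner, and the accuracy-based weighting below are toy stand-ins; the real package uses penalised learners and a fitted meta-learner rather than raw accuracy weights.

```python
import random

random.seed(3)

# Two synthetic "views" of each object: view 0 is informative, view 1 is noise.
def make_row(label):
    v0 = random.gauss(2.0 if label else 0.0, 1.0)   # informative view
    v1 = random.gauss(0.0, 1.0)                     # uninformative view
    return (v0, v1, label)

data = [make_row(random.random() < 0.5) for _ in range(400)]

def view_score(view, train, x):
    # Per-view base learner: nearest-centroid score (positive => class 1)
    pos = [r[view] for r in train if r[2]]
    neg = [r[view] for r in train if not r[2]]
    return -(x - sum(pos) / len(pos)) ** 2 + (x - sum(neg) / len(neg)) ** 2

def cv_accuracy(view):
    # 2-fold cross-validation: each view's predictive power is estimated
    # only from out-of-fold predictions, as in the MVS framework.
    hits = 0
    for fold in (0, 1):
        train = [r for i, r in enumerate(data) if i % 2 != fold]
        test = [r for i, r in enumerate(data) if i % 2 == fold]
        hits += sum((view_score(view, train, r[view]) > 0) == r[2] for r in test)
    return hits / len(data)

# Simplified meta-level: weight views by their cross-validated accuracy,
# so uninformative views are effectively down-weighted or dropped.
weights = [cv_accuracy(0), cv_accuracy(1)]
print([round(w, 2) for w in weights])
```

The informative view earns a much larger weight than the noise view, which is the mechanism behind MVS's automatic view selection.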


What do people expect from Artificial Intelligence? Public opinion on alignment in AI moderation from Germany and the United States

Jungherr, Andreas, Rauchfleisch, Adrian

arXiv.org Artificial Intelligence

Recent advances in generative Artificial Intelligence have raised public awareness, shaping expectations and concerns about their societal implications. Central to these debates is the question of AI alignment -- how well AI systems meet public expectations regarding safety, fairness, and social values. However, little is known about what people expect from AI-enabled systems and how these expectations differ across national contexts. We present evidence from two surveys of public preferences for key functional features of AI-enabled systems in Germany (n = 1800) and the United States (n = 1756). We examine support for four types of alignment in AI moderation: accuracy and reliability, safety, bias mitigation, and the promotion of aspirational imaginaries. U.S. respondents report significantly higher AI use and consistently greater support for all alignment features, reflecting broader technological openness and higher societal involvement with AI. In both countries, accuracy and safety enjoy the strongest support, while more normatively charged goals -- like fairness and aspirational imaginaries -- receive more cautious backing, particularly in Germany. We also explore how individual experience with AI, attitudes toward free speech, political ideology, partisan affiliation, and gender shape these preferences. AI use and free speech support explain more variation in Germany. In contrast, U.S. responses show greater attitudinal uniformity, suggesting that higher exposure to AI may consolidate public expectations. These findings contribute to debates on AI governance and cross-national variation in public preferences. More broadly, our study demonstrates the value of empirically grounding AI alignment debates in public attitudes and of explicitly developing normatively grounded expectations into theoretical and policy discussions on the governance of AI-generated content.


The influence of missing data mechanisms and simple missing data handling techniques on fairness

Bhatti, Aeysha, Sandrock, Trudie, Nienkemper-Swanepoel, Johane

arXiv.org Machine Learning

Fairness of machine learning algorithms is receiving increasing attention, as such algorithms permeate the day-to-day aspects of our lives. One way in which bias can manifest in a dataset is through missing values. If data are missing, these data are often assumed to be missing completely randomly; in reality the propensity of data being missing is often tied to the demographic characteristics of individuals. There is limited research into how missing values and the handling thereof can impact the fairness of an algorithm. Most researchers either apply listwise deletion or tend to use the simpler methods of imputation (e.g. mean or mode) compared to the more advanced ones (e.g. multiple imputation); we therefore study the impact of the simpler methods on the fairness of algorithms. The starting point of the study is the mechanism of missingness, leading into how the missing data are processed and finally how this impacts fairness. Three popular datasets in the field of fairness are amputed in a simulation study. The results show that under certain scenarios the impact on fairness can be pronounced when the missingness mechanism is missing at random. Furthermore, elementary missing data handling techniques like listwise deletion and mode imputation can lead to higher fairness compared to more complex imputation methods like k-nearest neighbour imputation, albeit often at the cost of lower accuracy.
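The interaction between the missingness mechanism and the handling technique can be made concrete with a small sketch: a feature goes missing at random (MAR) with a propensity tied to a binary group attribute, and listwise deletion is compared with single-value imputation. The group sizes, means, and missingness rates are invented for illustration and do not correspond to the paper's simulation design.

```python
import random
import statistics

random.seed(4)

# Synthetic rows: a continuous feature whose mean differs slightly by group.
rows = [{"group": g, "x": random.gauss(5 + g, 1)}
        for g in (random.randint(0, 1) for _ in range(2000))]

# Ampute under MAR: missingness depends on the group attribute --
# group 1 loses the feature 40% of the time, group 0 only 5%.
for r in rows:
    if random.random() < (0.40 if r["group"] else 0.05):
        r["x"] = None

# Listwise deletion: drop incomplete rows entirely.
deleted = [r for r in rows if r["x"] is not None]

# Single-value imputation with the overall observed mean (the continuous
# analogue of mode imputation for categorical features).
fill = statistics.mean(r["x"] for r in rows if r["x"] is not None)
imputed = [{"group": r["group"], "x": r["x"] if r["x"] is not None else fill}
           for r in rows]

def group_share(data, g):
    return sum(r["group"] == g for r in data) / len(data)

# Deletion shrinks group 1's representation in the training data;
# imputation preserves it but pulls group 1's values toward the pooled mean.
print(round(group_share(deleted, 1), 2), round(group_share(imputed, 1), 2))
```

Both handling choices distort the data in different ways under MAR tied to a protected attribute, which is why the paper finds the fairness impact depends jointly on the mechanism and the technique.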